Translating Neuralese
Several approaches have recently been proposed for learning decentralized
deep multiagent policies that coordinate via a differentiable communication
channel. While these policies are effective for many tasks, interpretation of
their induced communication strategies has remained a challenge. Here we
propose to interpret agents' messages by translating them. Unlike in typical
machine translation problems, we have no parallel data to learn from. Instead
we develop a translation model based on the insight that agent messages and
natural language strings mean the same thing if they induce the same belief
about the world in a listener. We present theoretical guarantees and empirical
evidence that our approach preserves both the semantics and pragmatics of
messages by ensuring that players communicating through a translation layer do
not suffer a substantial loss in reward relative to players with a common
language. Comment: Fixes typos and cleans up some model presentation details
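The belief-matching criterion above can be sketched in a few lines. This is a toy illustration, not the paper's trained model: the world states, candidate strings, and belief vectors are all hypothetical, and a real system would learn the listener belief distributions from data.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete belief distributions."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def translate(message_belief, candidate_beliefs):
    """Pick the natural-language string whose induced listener belief
    is closest (in KL divergence) to the belief induced by the agent
    message -- i.e. the two 'mean the same thing' to a listener."""
    return min(candidate_beliefs,
               key=lambda s: kl(message_belief, candidate_beliefs[s]))

# Beliefs over 3 hypothetical world states (illustrative numbers).
msg_belief = [0.7, 0.2, 0.1]  # belief a neuralese message induces
candidates = {
    "the target is on the left":  [0.72, 0.18, 0.10],
    "the target is on the right": [0.10, 0.20, 0.70],
    "I don't know where it is":   [0.34, 0.33, 0.33],
}
print(translate(msg_belief, candidates))  # → "the target is on the left"
```

The key point the sketch makes concrete: no parallel corpus is needed, only a way to evaluate what belief each message or string induces in a listener.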
The Assistive Multi-Armed Bandit
Learning preferences implicit in the choices humans make is a well studied
problem in both economics and computer science. However, most work makes the
assumption that humans are acting (noisily) optimally with respect to their
preferences. Such approaches can fail when people are themselves learning about
what they want. In this work, we introduce the assistive multi-armed bandit,
where a robot assists a human playing a bandit task to maximize cumulative
reward. In this problem, the human does not know the reward function but can
learn it through the rewards received from arm pulls; the robot only observes
which arms the human pulls but not the reward associated with each pull. We
offer necessary and sufficient conditions for successfully assisting the human
in this framework. Surprisingly, better human performance in isolation does not
necessarily lead to better performance when assisted by the robot: a human
policy can do better by effectively communicating its observed rewards to the
robot. We conduct proof-of-concept experiments that support these results. We
see this work as contributing towards a theory behind algorithms for
human-robot interaction. Comment: Accepted to HRI 201
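The information asymmetry described above can be simulated in a small sketch. This is not the paper's algorithm: here the human explores epsilon-greedily using rewards only it observes, and the robot, which sees only which arms were pulled, simply imitates the human's most-pulled arm on its own turns. All arm means and noise levels are made up for illustration.

```python
import random

def assistive_bandit(true_means, rounds=2000, eps=0.1, seed=0):
    """Toy assistive bandit: human and robot alternate pulls.
    The human keeps private reward statistics; the robot observes
    only the human's pull counts (not rewards)."""
    rng = random.Random(seed)
    n = len(true_means)
    counts, sums = [0] * n, [0.0] * n  # human's private statistics
    pulls = [0] * n                    # all the robot can observe
    total = 0.0
    for t in range(rounds):
        if t % 2 == 0:  # human's turn: epsilon-greedy on its own estimates
            if rng.random() < eps or not any(counts):
                arm = rng.randrange(n)
            else:
                arm = max(range(n),
                          key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)
            r = rng.gauss(true_means[arm], 0.1)
            counts[arm] += 1; sums[arm] += r; pulls[arm] += 1
            total += r
        else:           # robot's turn: imitate the human's modal arm
            arm = max(range(n), key=lambda a: pulls[a])
            total += rng.gauss(true_means[arm], 0.1)
    return total / rounds

print(assistive_bandit([0.2, 0.5, 0.9]))
```

Even this naive imitation policy illustrates the abstract's point: the robot's performance hinges on how informative the human's pulls are, not just on how well the human would do alone.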
Cost Functions for Robot Motion Style
We focus on autonomously generating robot motion for day to day physical
tasks that is expressive of a certain style or emotion. Because we seek
generalization across task instances and task types, we propose to capture
style via cost functions that the robot can use to augment its nominal task
cost and task constraints in a trajectory optimization process. We compare two
approaches to representing such cost functions: a weighted linear combination
of hand-designed features, and a neural network parameterization operating on
raw trajectory input. For each cost type, we learn weights for each style from
user feedback. We contrast these approaches to a nominal motion across
different tasks and for different styles in a user study, and find that they
both perform on par with each other, and significantly outperform the baseline.
Each approach has its advantages: featurized costs require learning fewer
parameters and can perform better on some styles, but neural network
representations do not require expert knowledge to design features and could
even learn more complex, nuanced costs than an expert can easily design.
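The first approach above, a weighted linear combination of hand-designed features fit from user feedback, can be sketched as follows. The features, trajectories, and perceptron-style preference update are all illustrative assumptions, not the study's actual implementation.

```python
import numpy as np

def features(traj):
    """Hand-designed style features for a 2-D trajectory (illustrative):
    path length, mean height, and jerkiness (sum of squared accelerations)."""
    traj = np.asarray(traj, dtype=float)
    vel = np.diff(traj, axis=0)
    acc = np.diff(vel, axis=0)
    return np.array([
        np.sum(np.linalg.norm(vel, axis=1)),  # total path length
        np.mean(traj[:, 1]),                  # average height
        np.sum(acc ** 2),                     # jerkiness
    ])

def style_cost(traj, w):
    """Style cost to add to the robot's nominal task cost."""
    return float(w @ features(traj))

def update_from_preference(w, preferred, other, lr=0.1):
    """Perceptron-style update from one piece of user feedback:
    nudge weights so the preferred trajectory scores a lower cost."""
    return w - lr * (features(preferred) - features(other))

w = np.zeros(3)
smooth = [[0, 0], [1, 0.1], [2, 0.1], [3, 0]]
jerky  = [[0, 0], [1, 1.0], [2, -1.0], [3, 0]]
for _ in range(5):
    w = update_from_preference(w, smooth, jerky)
print(style_cost(smooth, w) < style_cost(jerky, w))  # → True
```

In a full pipeline this learned `style_cost` would augment the task cost inside a trajectory optimizer; the neural-network variant from the abstract would replace `features` with a network acting on the raw trajectory.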